(54) Method and system for inserting a spread spectrum watermark into multimedia data



(57) Digital watermarking of audio, image, video or multimedia data is achieved by inserting the watermark into the perceptually significant components of a decomposition of the data in a manner so as to be visually imperceptible. In a preferred method, a frequency spectral image of the data, preferably a Fourier transform of the data, is obtained. A watermark is inserted into perceptually significant components of the frequency spectral image. The resultant watermarked spectral image is subjected to an inverse transform to produce watermarked data. The watermark is extracted from watermarked data by first comparing the watermarked data with the original data to obtain an extracted watermark. Then, the original watermark, the original data and the extracted watermark are compared to generate a watermark which is analyzed for authenticity of the watermark.
Description

The present invention concerns a method of digital watermarking for use in audio, image, video and multimedia data for the purpose of authenticating copyright ownership, identifying copyright infringers or transmitting a hidden message. Specifically, a watermark is inserted into the perceptually most significant components of a decomposition of the data in a manner so as to be virtually imperceptible. More specifically, a narrowband signal representing the watermark is placed in a wideband channel that is the data.

The proliferation of digitized media such as audio, image and video is creating a need for a security system which facilitates the identification of the source of the material. The need manifests itself in terms of copyright enforcement and identification of the source of the material.

Conventional cryptographic systems permit only valid keyholders to access encrypted data, but once the data is decrypted, it is not possible to maintain records of its subsequent representation or transmission. Conventional cryptography therefore provides minimal protection against the kind of data piracy a publisher or owner of data or material faces through unauthorized reproduction or distribution of such data or material.

A digital watermark is intended to complement cryptographic processes. The watermark is a visible or, preferably, an invisible identification code that is permanently embedded in the data. That is, the watermark remains with the data after any decryption process. As used herein, the terms data and material will be understood to refer to audio (speech and music), images (photographs and graphics), video (movies or sequences of images) and multimedia data (combinations of the above categories of materials), or processed or compressed versions thereof. These terms are not intended to refer to ASCII representations of text, but do refer to text represented as an image. A simple example of a watermark is a visible "seal" placed over an image to identify the copyright owner. However, the watermark might also contain additional information, including the identity of the purchaser of the particular copy of the image. An effective watermark should possess the following properties:

1. The watermark should be perceptually invisible or its presence should not interfere with the material being protected.

2. The watermark must be difficult (preferably virtually impossible) to remove from the material without rendering the material useless for its intended purpose. However, if only partial knowledge is known, e.g. the exact location of the watermark within an image is unknown, then attempts to remove or destroy the watermark, for instance by adding noise, should result in severe degradation in data fidelity, rendering the data useless, before the watermark is removed or lost.

3. The watermark should be robust against collusion by multiple individuals who each possess a watermarked copy of the data. That is, the watermark should be robust to the combining of copies of the same data set to destroy the watermarks. Also, it must not be possible for colluders to combine each of their images to generate a different valid watermark.

4. The watermark should still be retrievable if common signal processing operations are applied to the data. These operations include, but are not limited to, digital-to-analog and analog-to-digital conversion, resampling, requantization (including dithering and recompression) and common signal enhancements to image contrast and color, or audio bass and treble, for example. The watermarks in image and video data should be immune to geometric image operations such as rotation, translation, cropping and scaling.

5. The same digital watermark method or algorithm should be applicable to each of the different media under consideration. This is particularly useful in watermarking of multimedia material. Moreover, this feature is conducive to the implementation of video and image/video watermarking using common hardware.

6. Retrieval of the watermark should unambiguously identify the owner. Moreover, the accuracy of the owner identification should degrade gracefully during attack.

Several previous digital watermarking methods have been proposed. L. F. Turner, in patent number WO89/08915 entitled "Digital Data Security System", proposed a method for inserting an identification string into a digital audio signal by substituting the "insignificant" bits of randomly selected audio samples with the bits of an identification code. Bits are deemed "insignificant" if their alteration is inaudible. Such a system is also appropriate for two-dimensional data such as images, as discussed in an article by R. G. Van Schyndel et al. entitled "A digital watermark" in Intl. Conf. on Image Processing, vol. 2, pp. 86-90, 1994. The Turner method may easily be circumvented. For example, if it is known that the algorithm only affects the least significant two bits of a word, then it is possible to randomly flip all such bits, thereby destroying any existing identification code.
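
The weakness of least significant bit substitution can be made concrete with a short Python sketch. It is illustrative only: the 16-bit sample range, the secret sample positions and the 1000-bit identification code are assumptions, and `embed_lsb`, `extract_lsb` and `randomize_low_bits` are hypothetical helper names, not part of the Turner method as published. Randomizing the two lowest bits of every sample changes no sample by more than 3 units, yet leaves only chance-level agreement with the embedded code.

```python
import random


def embed_lsb(samples, id_bits, positions):
    """Overwrite the least significant bit of each selected sample with one ID bit."""
    marked = list(samples)
    for pos, bit in zip(positions, id_bits):
        marked[pos] = (marked[pos] & ~1) | bit
    return marked


def extract_lsb(samples, positions):
    """Read the ID bits back from the selected samples."""
    return [samples[pos] & 1 for pos in positions]


def randomize_low_bits(samples, nbits=2, seed=None):
    """Attack: replace the lowest `nbits` bits of every sample with random values."""
    rng = random.Random(seed)
    low_mask = (1 << nbits) - 1
    return [(s & ~low_mask) | rng.randrange(1 << nbits) for s in samples]


if __name__ == "__main__":
    rng = random.Random(1)
    audio = [rng.randrange(65536) for _ in range(10_000)]  # 16-bit unsigned samples
    positions = rng.sample(range(len(audio)), 1000)         # secret positions
    id_bits = [rng.randrange(2) for _ in positions]

    marked = embed_lsb(audio, id_bits, positions)
    attacked = randomize_low_bits(marked, nbits=2, seed=2)

    recovered = extract_lsb(attacked, positions)
    agreement = sum(a == b for a, b in zip(recovered, id_bits)) / len(id_bits)
    worst = max(abs(a - m) for a, m in zip(attacked, marked))
    print(f"ID bits surviving the attack: {agreement:.0%} (chance level is 50%)")
    print(f"largest change to any sample: {worst}")
```

The attacker needs no key: knowing only which bit planes are used is enough.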

An article entitled "Assuring Ownership Rights for Digital Images" by G. Caronni, in Proc. Reliable IT Systems, VIS '95, 1995, suggests adding tags (small geometric patterns) to digitized images at brightness levels that are imperceptible. While the idea of hiding a spatial watermark in an image is fundamentally sound, this scheme is susceptible to attack by filtering and redigitization. The fainter such watermarks are, the more susceptible they are to such attacks, and geometric shapes provide only a limited alphabet with which to encode information. Moreover, the scheme is not applicable to audio data and may not be robust to common geometric distortions, especially cropping.

J. Brassil et al., in an article entitled "Electronic Marking and Identification Techniques to Discourage Document Copying" in Proc. of Infocom 94, pp. 1278-1287, 1994, propose three methods appropriate for document images in which text is common. Digital watermarks are coded by: (1) vertically shifting text lines, (2) horizontally shifting words, or (3) altering text features such as the vertical endlines of individual characters. Unfortunately, all three proposals are easily defeated, as discussed by the authors. Moreover, these techniques are restricted exclusively to images containing text.

An article by K. Tanaka et al. entitled "Embedding Secret Information into a Dithered Multi-level Image" in IEEE Military Comm. Conf., pp. 216-220, 1990, and an article by K. Mitsui et al. entitled "Video-Steganography" in IMA Intellectual Property Proc., v1, pp. 187-206, 1994, describe several watermarking schemes that rely on embedding watermarks that resemble quantization noise. Their ideas hinge on the notion that quantization noise is typically imperceptible to viewers. Their first scheme injects a watermark into an image by using a predetermined data stream to guide level selection in a predictive quantizer. The data stream is chosen so that the resulting watermark looks like quantization noise. A variation of this scheme is also presented, where a watermark in the form of a dithering matrix is used to dither an image in a certain way. There are several drawbacks to these schemes. The most important is that they are susceptible to signal processing, especially requantization, and to geometric attacks such as cropping. Furthermore, they degrade an image in the same way that predictive coding and dithering can.

In Tanaka et al., the authors also propose a scheme for watermarking facsimile data. This scheme shortens or lengthens certain runs of data in the run-length code used to generate the coded fax image. This proposal is susceptible to digital-to-analog and analog-to-digital conversions. In particular, randomizing the least significant bit (LSB) of each pixel's intensity will completely alter the resulting run-length encoding. Tanaka et al. also propose a watermarking method for "color-scaled picture and video sequences". This method applies the same signal transform as JPEG (the DCT of 8 x 8 sub-blocks of an image) and embeds a watermark in the coefficient quantization module. While being compatible with existing transform coders, this scheme is quite susceptible to requantization and filtering and is equivalent to coding the watermark in the least significant bits of the transform coefficients.

In a recent paper by Macq and Quisquater entitled "Cryptology for Digital TV Broadcasting" in Proc. of the IEEE, 83(6), pp. 944-957, 1995, the issue of watermarking digital images is briefly discussed as part of a general survey on cryptography and digital television. The authors describe a procedure to insert a watermark into the least significant bits of pixels located in the vicinity of image contours. Since it relies on modifications of the least significant bits, the watermark is easily destroyed. Further, the method is only applicable to images, in that it seeks to insert the watermark into image regions that lie on the edge of contours.

W. Bender et al., in an article entitled "Techniques for Data Hiding" in Proc. of SPIE, v2420, page 40, July 1995, describe two watermarking schemes. The first is a statistical method called "Patchwork". Patchwork randomly chooses n pairs of image points (a_i, b_i) and increases the brightness at a_i by one unit while correspondingly decreasing the brightness of b_i. The expected value of the sum of the differences of the n pairs of points is claimed to be 2n, provided certain statistical properties of the image are true. In particular, it is assumed that all brightness levels are equally likely, that is, intensities are uniformly distributed. However, in practice, this is very uncommon. Moreover, the scheme may not be robust to randomly jittering the intensity levels by a single unit, and may be extremely sensitive to geometric affine transformations.
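
The Patchwork embedding step and its test statistic can be sketched in Python as follows. This is an illustration under stated assumptions, not Bender et al.'s implementation: the image is flattened to a list of integer intensities, the pairs are drawn from a keyed pseudorandom generator, and clipping at the ends of the intensity range is ignored. Because the same pairs are used for embedding and detection, marking raises the sum of differences by exactly 2n; whether that shift is detectable depends on the spread of the sum for the unmarked image, which is where the statistical assumptions enter.

```python
import random


def patchwork_pairs(num_pixels, n, key):
    """Draw n disjoint index pairs (a_i, b_i) from a keyed pseudorandom generator."""
    rng = random.Random(key)
    chosen = rng.sample(range(num_pixels), 2 * n)
    return chosen[:n], chosen[n:]


def patchwork_embed(pixels, n, key, delta=1):
    """Raise each a_i by `delta` and lower each b_i by `delta` (clipping ignored)."""
    a_idx, b_idx = patchwork_pairs(len(pixels), n, key)
    marked = list(pixels)
    for i in a_idx:
        marked[i] += delta
    for j in b_idx:
        marked[j] -= delta
    return marked


def patchwork_statistic(pixels, n, key):
    """Sum of differences S = sum(p[a_i] - p[b_i]) over the keyed pairs."""
    a_idx, b_idx = patchwork_pairs(len(pixels), n, key)
    return sum(pixels[i] - pixels[j] for i, j in zip(a_idx, b_idx))


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniformly distributed intensities, the assumption attributed to Patchwork above.
    image = [rng.randint(1, 254) for _ in range(100_000)]
    n = 10_000
    marked = patchwork_embed(image, n, key=1234)
    print("S unmarked:", patchwork_statistic(image, n, key=1234))
    print("S marked:  ", patchwork_statistic(marked, n, key=1234), "(shift of 2n =", 2 * n, ")")
```

In this sketch the unmarked S for 8-bit intensities has a standard deviation of roughly 100 * sqrt(n), so for n = 10,000 the 2n shift amounts to only about two standard deviations.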

The second method is called "texture block coding", where a region of random texture pattern found in the image is copied to an area of the image with similar texture. Autocorrelation is then used to recover each texture region. The most significant problem with this technique is that it is only appropriate for images that possess large areas of random texture. The technique could not be used on images of text, for example. Nor is there a direct analog for audio.

In addition to direct work on watermarking images, there are several works of interest in related areas. E. H. Adelson, in U.S. Patent No. 4,939,515 entitled "Digital Signal Encoding and Decoding Apparatus", describes a technique for embedding digital information in an analog signal for the purpose of inserting digital data into an analog TV signal. The analog signal is quantized into one of two disjoint ranges ({0, 2, 4, ...} or {1, 3, 5, ...}, for example), which are selected based on the binary digit to be transmitted. Thus Adelson's method is equivalent to watermark schemes that encode information into the least significant bits of the data or its transform coefficients. Adelson recognizes that the method is susceptible to noise and therefore proposes an alternative scheme wherein a 2x1 Hadamard transform of the digitized analog signal is taken. The differential coefficient of the Hadamard transform is offset by 0 or 1 unit prior to computing the inverse transform. This corresponds to encoding the watermark into the least significant bit of the differential coefficient of the Hadamard transform. It is not clear that this approach would demonstrate enhanced resilience to noise. Furthermore, as with all such least significant bit schemes, an attacker can eliminate the watermark by randomization.
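
The two disjoint ranges in Adelson's first scheme amount to parity (even/odd) quantization, which the following Python sketch illustrates. The function names are hypothetical and the 2x1 Hadamard variant is not shown. Any perturbation smaller than half a unit leaves the bit intact, but adding a random 0 or 1 to each sample, a change of at most one unit, destroys it, which is the randomization attack mentioned above.

```python
import random


def parity_encode(value, bit):
    """Quantize `value` to the nearest integer whose parity equals `bit`.

    Even integers {..., 0, 2, 4, ...} carry a 0 and odd integers {..., 1, 3, 5, ...} carry a 1.
    """
    q = round(value)
    if q % 2 != bit:
        # Step to the neighboring integer on the same side as the original value.
        q += 1 if value >= q else -1
    return q


def parity_decode(value):
    """Recover the bit as the parity of the nearest integer."""
    return round(value) % 2


if __name__ == "__main__":
    rng = random.Random(3)
    signal = [rng.uniform(0, 100) for _ in range(1000)]
    bits = [rng.randrange(2) for _ in signal]
    coded = [parity_encode(v, b) for v, b in zip(signal, bits)]

    small_noise = [c + rng.uniform(-0.4, 0.4) for c in coded]
    randomized = [c + rng.randrange(2) for c in coded]  # attacker adds 0 or 1

    def accuracy(received):
        return sum(parity_decode(r) == b for r, b in zip(received, bits)) / len(bits)

    print(f"after small noise: {accuracy(small_noise):.0%}")
    print(f"after randomization: {accuracy(randomized):.0%}")
```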

U.S. Patent No. 5,010,405 describes a method of interleaving a standard NTSC signal within an enhanced definition television (EDTV) signal. This is accomplished by analyzing the frequency spectrum of the EDTV signal (larger than that of the NTSC signal) and decomposing it into three sub-bands (L, M and H for low, medium and high frequency, respectively). In contrast, the NTSC signal is decomposed into two sub-bands, L and M. The coefficients M_k within the M band are quantized into M levels, and the high frequency coefficients H_k of the EDTV signal are scaled such that the addition of the H_k signal plus any noise present in the system is less than the minimum separation between quantization levels. Once more, the method relies on modifying least significant bits. Presumably, the mid-range rather than low frequencies were chosen because they are less perceptually significant. In contrast, the method proposed in the present invention modifies the most perceptually significant components of the signal.

Finally, it should be noted that many, if not all, of the 
prior art protocols are not collusion resistant. 

Recently, Digimarc Corporation of Portland, Oregon, has described work referred to as signature technology for use in identifying digital intellectual property. Their method adds or subtracts small random quantities from each pixel. Addition or subtraction is based on comparing a binary mask of N bits with the least significant bit (LSB) of each pixel. If the LSB is equal to the corresponding mask bit, then the random quantity is added; otherwise it is subtracted. The watermark is extracted by first computing the difference between the original and watermarked images and then examining the sign of the difference, pixel by pixel, to determine if it corresponds to the original sequence of additions/subtractions. The Digimarc technique is not based on direct modifications of the image spectrum and does not make use of perceptual relevance. While the technique appears to be robust, it may be susceptible to constant brightness offsets and to attacks that exploit the high degree of local correlation present in an image. For example, randomly switching the position of similar pixels within a local neighborhood may significantly degrade the watermark without damaging the image.
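
A hypothetical Python sketch of the add/subtract rule described above follows. The details not stated in the text are assumptions: the N-bit mask is simply cycled over the pixels, the random quantities are integers from 1 to `max_step`, no clipping is applied, and `signature_embed` and `signature_match` are illustrative names rather than Digimarc's API. In this sketch, a uniform brightness increase larger than `max_step` makes every difference positive, so the match rate falls to roughly one half, illustrating the sensitivity to constant brightness offsets noted above.

```python
import random


def signature_embed(pixels, mask_bits, key, max_step=2):
    """Add or subtract a small keyed random quantity to every pixel.

    The quantity is added when the pixel's LSB equals its mask bit and subtracted
    otherwise. Cycling the mask over the pixels is an assumption of this sketch.
    """
    rng = random.Random(key)
    marked = []
    for k, p in enumerate(pixels):
        step = rng.randint(1, max_step)
        bit = mask_bits[k % len(mask_bits)]
        marked.append(p + step if (p & 1) == bit else p - step)
    return marked


def signature_match(original, suspect, mask_bits):
    """Fraction of pixels where sign(suspect - original) matches the expected +/- pattern."""
    hits = 0
    for k, (p, s) in enumerate(zip(original, suspect)):
        expected = 1 if (p & 1) == mask_bits[k % len(mask_bits)] else -1
        hits += (s - p) * expected > 0
    return hits / len(original)


if __name__ == "__main__":
    rng = random.Random(5)
    image = [rng.randint(2, 253) for _ in range(50_000)]
    mask = [rng.randrange(2) for _ in range(64)]  # N = 64 mask bits
    marked = signature_embed(image, mask, key=77)
    brighter = [p + 3 for p in marked]            # constant brightness offset
    print(f"watermarked copy: {signature_match(image, marked, mask):.0%}")
    print(f"after +3 offset:  {signature_match(image, brighter, mask):.0%}")
```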

In a paper by Koch, Rindfrey and Zhao entitled "Copyright Protection for Multimedia Data", two general methods for watermarking images are described. The first method partitions an image into 8x8 blocks of pixels and computes the Discrete Cosine Transform (DCT) of each of these blocks. A pseudorandom subset of the blocks is chosen, and in each such block a triple of frequencies selected from one of 18 predetermined triples is modified so that their relative strengths encode a 1 or 0 value. The 18 possible triples are composed by selecting three out of eight predetermined frequencies within the 8x8 DCT block. The choice of the eight frequencies to be altered within the DCT block appears to be based on the belief that middle frequencies have a moderate variance level, i.e., they have similar magnitudes. This property is needed in order to allow the relative strength of the frequency triples to be altered without requiring a modification that would be perceptually noticeable. Unlike in the present invention, the set of frequencies is not chosen based on any perceptual significance or relative energy considerations. In addition, because the variance between the eight frequency coefficients is small, one would expect that the technique may be sensitive to noise or distortions. This is supported by the experimental results reported in the Koch et al. paper, supra, where it is reported that the "embedded labels are robust against JPEG compression for a quality factor as low as about 50%". In contrast, the method described in accordance with the teachings of the present invention has been demonstrated with compression quality factors as low as 5 percent.

An earlier proposal by Koch and Zhao in a paper entitled "Toward Robust and Hidden Image Copyright Labeling" proposed not triples of frequencies but pairs of frequencies, and was again designed specifically for robustness to JPEG compression. Nevertheless, the report states that "a lower quality factor will increase the likelihood that the changes necessary to superimpose the embedded code on the signal will be noticeably visible".

In a second method, proposed by Koch and Zhao and designed for black and white images, no frequency transform is employed. Instead, the selected blocks are modified so that the relative frequency of white and black pixels encodes the final value. Both watermarking procedures are particularly vulnerable to multiple document attacks. To protect against this, Zhao and Koch proposed a distributed 8x8 block of pixels created by randomly sampling 64 pixels from the image. However, the resulting DCT has no relationship to that of the true image. Consequently, one would expect such distributed blocks to be both sensitive to noise and likely to cause noticeable artifacts in the image.

In summary, prior art digital watermarking techniques are not robust and the watermark is easy to remove. In addition, many prior techniques would not survive common signal and geometric distortions.

The present invention overcomes the limitations of the prior art methods by providing a watermarking system that embeds a unique identifier into the perceptually significant components of a decomposition of an image, an audio signal or a video sequence.

Preferably, the decomposition is a spectral frequency decomposition. The watermark is embedded in the data's perceptually significant frequency components. This is because an effective watermark cannot be located in perceptually insignificant regions of image data or in its frequency spectrum, since many common signal or geometric processes affect these components. For example, a watermark located in the high frequency spectral components of an image is easily removed, with minor degradation to the image, by a process that performs low pass filtering. The issue then becomes one of how to insert the watermark into the most significant regions of the data frequency spectrum without the alteration being noticeable to an observer, i.e., a human or a machine feature recognition system. Any spectral component may be altered, provided the alteration is small. However, very small alterations are susceptible to any noise present or intentional distortion.

In order to overcome this problem, the frequency domain of the image data or sound data may be considered as a communication channel, and correspondingly the watermark may be considered as a signal transmitted through the channel. Attacks and intentional signal distortions are thus treated as noise from which the transmitted signal must be immune. Attacks are intentional efforts to remove, delete or otherwise overcome the beneficial aspects of the data watermarking. While the present invention is intended to embed watermarks in data, the same methodology can be applied to sending any type of message through media data.

Instead of encoding the watermark into the least significant components of the data, the present invention applies concepts of spread spectrum communication. In spread spectrum communications, a narrowband signal is transmitted over a much larger bandwidth such that the signal energy present in any single frequency is imperceptible. In a similar manner, the watermark is spread over many frequency bins so that the energy in any single bin is small and imperceptible. Since the watermark verification process includes a priori knowledge of the locations and content of the watermarks, it is possible to concentrate these many weak signals into a single signal with a high signal-to-noise ratio. Destruction of such a watermark would require noise of high amplitude to be added to every frequency bin.
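
The gain from concentrating many weak signals can be checked numerically. The Python sketch below assumes i.i.d. Gaussian noise in every bin and uses a normalized correlation as the detector; that similarity measure is an assumption of this sketch, not something specified in this section. With a per-bin watermark amplitude one tenth of the noise standard deviation (a per-bin signal-to-noise ratio of -20 dB), the correlation summed over n = 20,000 bins grows roughly as 0.1 * sqrt(n), about 14, while an unrelated watermark scores roughly a standard normal value.

```python
import math
import random


def similarity(extracted, watermark):
    """Correlation of `extracted` with a candidate `watermark`, normalized by |extracted|.

    When `watermark` is i.i.d. N(0, 1) and unrelated to `extracted`, this is
    approximately N(0, 1) regardless of the scale of `extracted`.
    """
    dot = sum(x * w for x, w in zip(extracted, watermark))
    norm = math.sqrt(sum(x * x for x in extracted))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha, sigma = 20_000, 0.1, 1.0  # watermark amplitude is 10% of the noise per bin
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    received = [alpha * wi + rng.gauss(0.0, sigma) for wi in w]
    other = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(f"true watermark:   {similarity(received, w):.1f}")
    print(f"random watermark: {similarity(received, other):.1f}")
```

The signal term grows in proportion to n while the noise term grows only as sqrt(n), which is why spreading over more bins allows a weaker mark per bin.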

In accordance with the teachings of the present invention, a watermark is inserted into the perceptually most significant regions of the data decomposition. The watermark itself is designed to appear to be additive random noise and is spread throughout the image. By placing the watermark into the perceptually significant components, it is much more difficult for an attacker to add more noise to the components without adversely affecting the image or other data. It is the fact that the watermark looks like noise and is spread throughout the image or data which makes the present scheme appear similar to spread spectrum methods used in communications systems.

Spreading the watermark throughout the spectrum of an image ensures a large measure of security against unintentional or intentional attack. First, the location of the watermark is not obvious. Second, frequency regions are selected in a fashion that ensures severe degradation of the original data following any attack on the watermark.

A watermark that is well placed in the frequency domain of an image or a sound track will be practically impossible to see or hear. This will always be the case if the energy in the watermark is sufficiently small in any single frequency coefficient. Moreover, it is possible to increase the energy present in particular frequencies by exploiting knowledge of masking phenomena in the human auditory and visual systems. Perceptual masking refers to any situation where information in certain regions of an image or a sound is occluded by perceptually more prominent information in another part of the image or sound. In digital waveform coding, this frequency domain (and in some cases, time/pixel domain) masking is exploited extensively to achieve low bit rate encoding of data. It is clear that both auditory and visual systems attach more resolution to the high energy, low frequency spectral regions of an auditory or visual scene. Further, spectrum analysis of images and sounds reveals that most of the information in such data is often located in the low frequency regions.

In addition, particularly for processed or compressed data, perceptually significant need not refer to human perceptual significance, but may refer instead to machine perceptual significance, for instance, machine feature recognition.

To meet these requirements, a watermark is proposed whose structure comprises a large quantity, for instance 1000, of randomly generated numbers with a normal distribution having zero mean and unit variance. A binary watermark is not chosen because it is much less robust to attacks based on collusion of several independently watermarked copies of an image. However, in general, the watermark might have arbitrary structure, both deterministic and/or random, including uniform distributions. The length of the proposed watermark is variable and can be adjusted to suit the characteristics of the data. For example, longer watermarks might be used for images that are especially sensitive to large modifications of their spectral coefficients, thus requiring weaker scaling factors for individual components.
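
The watermark structure proposed here, and why collusion by averaging is less damaging than one might expect, can be illustrated with a short Python sketch. Assumptions: each copy's watermark is generated from an integer key with Python's `random.gauss`, embedding is additive so that subtracting the original from the averaged copies leaves the average of the watermarks, and `make_watermark` and `similarity` are hypothetical names. Averaging five copies scales each watermark down by a factor of five, but with 1000 components each buyer's watermark still produces a response of roughly sqrt(1000/5), about 14, against a spread of about 1 for unrelated watermarks.

```python
import math
import random


def make_watermark(key, n=1000):
    """n independent N(0, 1) values generated from a secret key (hypothetical keying)."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def similarity(extracted, watermark):
    """Normalized correlation; approximately N(0, 1) for an unrelated N(0, 1) watermark."""
    dot = sum(x * w for x, w in zip(extracted, watermark))
    norm = math.sqrt(sum(x * x for x in extracted))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    marks = [make_watermark(key) for key in range(5)]  # five buyers' watermarks
    # Additive embedding assumed: after subtracting the original, the averaged
    # copies leave only the average of the five watermarks.
    averaged = [sum(col) / len(marks) for col in zip(*marks)]
    for key, mark in enumerate(marks):
        print(f"buyer {key}: {similarity(averaged, mark):.1f}")
    print(f"innocent key: {similarity(averaged, make_watermark(999)):.1f}")
```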

The watermark is then placed in components of the image spectrum. These components may be chosen based on an analysis of those components which are most vulnerable to attack and/or which are most perceptually significant. This ensures that the watermark remains with the image even after common signal and geometric distortions. Modification of these spectral components results in severe image degradation long before the watermark itself is destroyed. Of course, to insert the watermark, it is necessary to alter these very same coefficients. However, each modification can be extremely small and, in a manner similar to spread spectrum communication, a strong narrowband watermark may be distributed over a much broader image (channel) spectrum. Conceptually, detection of the watermark then proceeds by adding all of these very small signals, whose locations are only known to the copyright owner, and concentrating the watermark into a signal with high signal-to-noise ratio. Because the location of the watermark is only known to the copyright holder, an attacker would have to add much more noise energy to each spectral coefficient in order to be confident of removing the watermark. However, this process would destroy the image.

Preferably, a predetermined number of the largest coefficients of the DCT (discrete cosine transform), excluding the DC term, are used. However, the choice of the DCT is not critical to the algorithm, and other spectral transforms, including wavelet-type decompositions, are also possible. In fact, use of the FFT rather than the DCT is preferable from a computational perspective.

The invention will be more clearly understood when 
the following description is read in conjunction with the 
accompanying drawing. 



Figure 1 is a schematic representation of typical common processing operations to which data could be subjected;

Figure 2 is a schematic representation of a preferred system for immersing a watermark into an image;

Figures 3a and 3b are flow charts of the encoding and decoding of watermarks;

Figure 4 is a schematic diagram of an optical embodiment of the present invention;

Figure 5 is a graph of the responses of the watermark detector to random watermarks;

Figure 6 is a graph of the response of the watermark detector to random watermarks for an image which is successively watermarked five times;

Figure 7 is a graph of the response of the watermark detector to random watermarks where five images, each having a different watermark, are averaged together; and
In order to better understand the advantages of the invention, the preferred embodiment of a frequency spectrum based watermarking system will be described. It is instructive to examine the processing stages that image (or sound) data may undergo in the copying process and to consider the effect that such processing stages can have on the data. Referring to Figure 1, a watermarked image or sound data 10 is transmitted 12 to undergo typical distortion or intentional tampering 14. Such distortions or tampering include lossy compression 16, geometric distortion 18, signal processing 20 and D/A and A/D conversion 22. After undergoing distortion or tampering, corrupted watermarked image or sound data 24 is transmitted 26. The process of "transmission" refers to the application of any source or channel code and/or of encryption techniques to the data. While most transmission steps are information lossless, many compression schemes (e.g., JPEG, MPEG, etc.) may potentially degrade the quality of the data through irretrievable loss of data. In general, a watermarking method should be resilient to any distortions introduced by transmission or compression algorithms.

Lossy compression 16 is an operation that usually eliminates perceptually irrelevant components of image or sound data. In order to preserve a watermark when undergoing lossy compression, the watermark is located in a perceptually significant region of the data. Most processing of this type occurs in the frequency domain, and data loss usually occurs in the high frequency components. Thus, the watermark must be placed in the significant frequency components of the image (or sound) data spectrum to minimize the adverse effects of lossy compression.
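
The following Python sketch contrasts a watermark in low-frequency coefficients with one in high-frequency coefficients under a crude model of lossy compression that discards the upper three quarters of the spectrum. It works directly on a synthetic list of transform coefficients with a 1/f-like magnitude decay, uses the lowest non-DC frequencies as a stand-in for the perceptually significant components, and assumes a multiplicative insertion rule v'_k = v_k(1 + alpha * x_k); all of these are assumptions of the sketch, and the helper names are hypothetical. The low-frequency watermark is recovered intact, while the high-frequency one yields a response indistinguishable from chance.

```python
import math
import random


def similarity(extracted, watermark):
    """Normalized correlation; approximately N(0, 1) for an unrelated N(0, 1) watermark."""
    dot = sum(x * w for x, w in zip(extracted, watermark))
    norm = math.sqrt(sum(x * x for x in extracted))
    return dot / norm if norm else 0.0


def embed(coeffs, positions, watermark, alpha=0.1):
    """Assumed multiplicative rule: v'_k = v_k * (1 + alpha * x_k)."""
    marked = list(coeffs)
    for k, x in zip(positions, watermark):
        marked[k] = coeffs[k] * (1 + alpha * x)
    return marked


def extract(original, received, positions, alpha=0.1):
    """Invert the insertion rule using the original coefficients."""
    return [(received[k] / original[k] - 1) / alpha for k in positions]


def compress(coeffs, keep):
    """Crude lossy compression: zero every coefficient at or above index `keep`."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]


if __name__ == "__main__":
    rng = random.Random(8)
    size, n = 4096, 1000
    coeffs = [rng.gauss(0.0, 1.0) * 1000 / (k + 1) for k in range(size)]  # 1/f-like decay
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]

    low = list(range(1, n + 1))          # lowest non-DC frequencies
    high = list(range(size - n, size))   # highest frequencies
    for name, pos in (("low-frequency", low), ("high-frequency", high)):
        received = compress(embed(coeffs, pos, w), keep=size // 4)
        score = similarity(extract(coeffs, received, pos), w)
        print(f"{name} watermark after compression: {score:.1f}")
```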

After receipt, an image may encounter many common transformations that are broadly categorized as geometric distortions or signal distortions. Geometric distortions 18 are specific to image and video data, and include such operations as rotation, translation, scaling and cropping. By manually determining a minimum of four or nine corresponding points between the original and the distorted watermark, it is possible to remove any two or three dimensional affine transformation. However, an affine scaling (shrinking) of the image results in a loss of data in the high frequency spectral regions of the image. Cropping, or the cutting out and removal of portions of an image, also results in irretrievable loss of data. Cropping may be a serious threat to any spatially based watermark but is less likely to affect a frequency-based scheme.

Common signal distortions include digital-to-analog and analog-to-digital conversion 22, resampling, requantization (including dithering and recompression), and common signal enhancements to image contrast and/or color, and audio frequency equalization. Many of these distortions are non-linear, and it is difficult to analyze their effect in either a spatial or frequency based method. However, the fact that the original image is known allows many signal transformations to be undone, at least approximately. For example, histogram equalization, a common non-linear contrast enhancement method, may be substantially removed by histogram specification or dynamic histogram warping techniques.
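
Undoing a contrast enhancement with the original's histogram can be sketched in Python using rank-based histogram matching, one simple form of histogram specification (dynamic histogram warping is not shown). The 8-bit image is flattened to a list, the dark synthetic image is an assumption chosen so that equalization changes it substantially, and `equalize` and `histogram_specify` are hypothetical names. Because equalization is monotone, matching the enhanced image back to the original histogram restores each pixel except where equalization merged neighboring gray levels, which is why such an enhancement can only be substantially, not exactly, removed.

```python
import random


def equalize(pixels, levels=256):
    """Histogram equalization: map each level v to round((levels - 1) * CDF(v))."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(pixels))
    return [round((levels - 1) * cdf[p]) for p in pixels]


def histogram_specify(values, reference):
    """Rank-based histogram specification.

    Each entry of `values` is replaced by the entry of sorted(`reference`) with the
    same rank, so the output has exactly the reference histogram while preserving
    the ordering of `values`.
    """
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    order = sorted(range(len(values)), key=lambda i: values[i])
    target = sorted(reference)
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = target[rank]
    return out


def mean_abs_error(image, original):
    return sum(abs(a - b) for a, b in zip(image, original)) / len(original)


if __name__ == "__main__":
    rng = random.Random(21)
    original = [int(255 * rng.random() ** 3) for _ in range(10_000)]  # mostly dark image
    enhanced = equalize(original)
    restored = histogram_specify(enhanced, original)
    print(f"mean |error| enhanced: {mean_abs_error(enhanced, original):.1f}")
    print(f"mean |error| restored: {mean_abs_error(restored, original):.2f}")
```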

Finally, the copied image may not remain in digital form. Instead, it is likely to be printed or an analog recording made (analog audio or video tape). These reproductions introduce additional degradation into the image data that a watermarking scheme must be robust to.

Tampering (or attack) refers to any intentional attempt to remove the watermark, or corrupt it beyond recognition. The watermark must not only be resistant to the inadvertent application of distortions; it must also be immune to intentional manipulation by malicious parties. These manipulations can include combinations of distortions, and can also include collusion and forgery attacks.

EP 0 766 468 A2

Figure 2 shows a preferred system for inserting a watermark into an image in the frequency domain. Image data X(i, j), assumed to be in digital form (or alternatively data in other formats, such as photographs, paintings or the like, that have been previously digitized by well-known methods), is subjected to a frequency transformation 30, such as the Fourier transform. A watermark signal W(k) is inserted into the frequency spectrum components of the transformed image data 32 by applying the techniques described below. The frequency spectrum image data including the watermark signal is subjected to an inverse frequency transform 34, resulting in watermarked image data X'(i, j), which may remain in digital form or be printed as an analog representation by well-known methods.

After applying a frequency transformation to the 
image data 30, a perceptual mask is computed that 
highlights prominent regions in the frequency spectrum 
capable of supporting the watermark without overly 
affecting perceptual fidelity. This may be performed by 
using knowledge of the perceptual significance of each 
frequency in the spectrum, as discussed earlier, or sim- 
ply by ranking the frequencies based on their energy. 
The latter method was used in experiments described 
below. 

In general, it is desired to place the watermark in 
regions of the spectrum that are least affected by com- 
mon signal distortions and are most significant to image 
quality as perceived by a viewer, such that significant 
modification would destroy the image fidelity. In prac- 
tice, these regions could be experimentally identified by 
applying common signal distortions to images and 
examining which frequencies are most affected, and by 
psychophysical studies to identify how much each com- 
ponent may be modified before significant changes in 
the image are perceivable. 

The watermark signal is then inserted into these 
prominent regions in a way that makes any tampering 
create visible (or audible) defects in the data. The 
requirements of the watermark mentioned above and 
the distortions common to copying provide constraints 
on the design of an electronic watermark. 

In order to better understand the watermarking method, reference is made to Figures 3(a) and 3(b), where from each document D a sequence of values $X = x_1, \ldots, x_n$ is extracted 40, with which a watermark $W = w_1, \ldots, w_n$ is combined 42 to create an adjusted sequence of values $X' = x_1', \ldots, x_n'$, which is then inserted back 44 into the document in place of the values X in order to obtain a watermarked document D'. An attack on the document D', or other distortion, will produce a document D*. Having the original document D and the document D*, a possibly corrupted watermark W* is extracted 46 and compared to the watermark W 48 for statistical analysis 50. The values W* are extracted by first extracting a set of values $X^* = x_1^*, \ldots, x_n^*$ from D* (using information about D) and then generating W* from the values X* and the values X.

When combining the values X with the watermark values W in step 42, a scaling parameter $\alpha$ is specified. The scaling parameter $\alpha$ determines the extent to which the values W alter the values X. Three preferred formulas for computing X' are:

$$x_i' = x_i + \alpha w_i \qquad (1)$$

$$x_i' = x_i (1 + \alpha w_i) \qquad (2)$$

$$x_i' = x_i e^{\alpha w_i} \qquad (3)$$

Equation 1 is invertible. Equations 2 and 3 are invertible when $x_i \neq 0$. Therefore, given X*, it is possible to compute the inverse function necessary to derive W* from X and X*.
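A minimal Python sketch of equations 1 to 3 and the inverse functions that recover $w_i^*$ from $x_i$ and $x_i^*$. The function names are hypothetical, and the sketch assumes real-valued scalar coefficients.

```python
import math

# Hypothetical helper names; alpha is the global scaling parameter.

def insert_additive(x, w, alpha):
    """Equation (1): x' = x + alpha * w."""
    return x + alpha * w

def insert_multiplicative(x, w, alpha):
    """Equation (2): x' = x * (1 + alpha * w)."""
    return x * (1 + alpha * w)

def insert_exponential(x, w, alpha):
    """Equation (3): x' = x * exp(alpha * w)."""
    return x * math.exp(alpha * w)

def extract_additive(x, x_star, alpha):
    """Inverse of (1); defined for every x."""
    return (x_star - x) / alpha

def extract_multiplicative(x, x_star, alpha):
    """Inverse of (2); requires x != 0."""
    if x == 0:
        raise ValueError("equation (2) cannot be inverted when x == 0")
    return (x_star / x - 1) / alpha

def extract_exponential(x, x_star, alpha):
    """Inverse of (3); requires x != 0 and x_star / x > 0."""
    if x == 0:
        raise ValueError("equation (3) cannot be inverted when x == 0")
    return math.log(x_star / x) / alpha

# Round trip on a single (negative) coefficient with no distortion.
x, w, alpha = -4.0, 0.8, 0.1
recovered = [
    extract_additive(x, insert_additive(x, w, alpha), alpha),
    extract_multiplicative(x, insert_multiplicative(x, w, alpha), alpha),
    extract_exponential(x, insert_exponential(x, w, alpha), alpha),
]
```

With no attack, every inverse returns the original watermark value. After an attack, the same functions yield the possibly corrupted $w_i^*$.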

Equation 1 is not the preferred formula when the values $x_i$ vary over a wide range. For example, if $x_i = 10^6$, then adding 100 may be insufficient to establish a watermark, but if $x_i = 10$, then adding 100 will unacceptably distort the value. Insertion methods using equations 2 and 3 are more robust when encountering such a wide range of values $x_i$. It will also be observed that equations 2 and 3 yield similar results when $\alpha w_i$ is small. Moreover, when $x_i$ is positive, equation 3 is equivalent to $\ln(x_i') = \ln(x_i) + \alpha w_i$ and may be considered as an application of equation 1 when natural logarithms of the original values are used. For example, if $|w_i| \le 1$ and $\alpha = 0.01$, then using equation 2 guarantees that the spectral coefficient will change by no more than 1%.
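The arithmetic in the preceding paragraph can be checked numerically. This sketch uses only the values quoted in the text ($10^6$, $10$, a step of 100, $\alpha = 0.01$, $|w_i| \le 1$). The extra coefficient $-3.5$ and the sampled $w$ values are illustrative.

```python
import math

def relative_change(x, x_new):
    """Fractional change |x' - x| / |x| of a coefficient."""
    return abs(x_new - x) / abs(x)

# Equation (1) with a fixed additive step of 100: negligible for a large
# coefficient, overwhelming for a small one.
additive = {x: relative_change(x, x + 100) for x in (1e6, 10.0)}

# Equation (2) with alpha = 0.01 and |w| <= 1: every coefficient, large or
# small, changes by at most 1 percent.
alpha = 0.01
multiplicative = [relative_change(x, x * (1 + alpha * w))
                  for x in (1e6, 10.0, -3.5) for w in (-1.0, -0.3, 0.5, 1.0)]

# Equations (2) and (3) agree to first order when alpha * w is small,
# since exp(a) = 1 + a + O(a**2).
gap = max(abs(x * math.exp(alpha * w) - x * (1 + alpha * w)) / abs(x)
          for x in (1e6, 10.0) for w in (-1.0, 1.0))
```

In this example the fixed step of 100 changes $10^6$ by 0.01% but changes $10$ by 1000%. Equation 2 stays within 1% for every coefficient, and its gap to equation 3 is on the order of $(\alpha w)^2 / 2$.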

For certain applications, a single scaling parameter $\alpha$ may not be best for combining all values of $x_i$. Therefore, multiple scaling parameters $\alpha_1, \ldots, \alpha_n$ can be used with revised equations 1 to 3, such as $x_i' = x_i(1 + \alpha_i w_i)$. The values of $\alpha_i$ serve as a relative measure of how much $x_i$ must be altered to change the perceptual quality of the document. A large value for $\alpha_i$ means that it is possible to alter $x_i$ by a large amount without perceptually degrading the document.

A method for selecting the multiple scaling values is based upon certain general assumptions. For example, equation 2 is a special case of the generalized equation 1, $x_i' = x_i + \alpha_i w_i$, for $\alpha_i = \alpha x_i$. That is, equation 2 makes the reasonable assumption that a large value of $x_i$ is less sensitive to additive alteration than a small value of $x_i$.

Generally, the sensitivity of the image to different values of $\alpha_i$ is unknown. A method of empirically estimating the sensitivities is to determine the distortion caused by a number of attacks on the original image. For example, it is possible to compute a degraded image D* from D, extract the corresponding values $x_1^*, \ldots, x_n^*$, and select $\alpha_i$ to be proportional to the deviation $|x_i^* - x_i|$. For greater robustness, it is possible to try other forms of distortion and make $\alpha_i$ proportional to the average value of $|x_i^* - x_i|$. Instead of using the average deviation, it is possible to use the median or maximum deviation.
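A sketch of the empirical estimate, assuming the coefficient values of each degraded copy D* have already been extracted. The function name, the proportionality constant `scale`, and the two simulated attacks are all hypothetical.

```python
import statistics

def estimate_alphas(x, attacked, how="mean", scale=1.0):
    """Empirical per-coefficient scaling parameters.

    x        -- original values x_1..x_n extracted from D
    attacked -- one list of values x*_1..x*_n per degraded copy D*
    how      -- "mean", "median" or "max" of the deviations |x*_i - x_i|
    scale    -- hypothetical proportionality constant
    """
    aggregate = {"mean": statistics.mean, "median": statistics.median, "max": max}[how]
    return [scale * aggregate([abs(xs[i] - x[i]) for xs in attacked])
            for i in range(len(x))]

# Hypothetical coefficients and two simulated attacks.
x = [100.0, 50.0, 5.0, 0.5]
attacked = [
    [v + d for v, d in zip(x, [4.0, -2.0, 1.0, 0.5])],   # additive noise (made-up values)
    [v * 0.9 for v in x],                                  # 10 percent gain reduction
]
alphas_mean = estimate_alphas(x, attacked, "mean")
alphas_max = estimate_alphas(x, attacked, "max")
```

Coefficients that move a lot under the trial attacks receive a large $\alpha_i$. Choosing `"max"` instead of `"mean"` gives a more conservative estimate.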

Alternatively, it is possible to combine the empirical approach with general global assumptions regarding the sensitivity of the values. For example, it might be required that $\alpha_i \ge \alpha_j$ whenever $x_i \ge x_j$. This can be combined with the empirical approach by setting $\alpha_i$ according to

$$\alpha_i \sim \max_{\{j \,:\, x_j \le x_i\}} |x_j^* - x_j|.$$
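The monotone rule above can be computed in one pass over the coefficients sorted by value, keeping a running maximum of the deviations. This is a sketch: the function name is hypothetical, and the proportionality constant is taken as 1.

```python
def monotone_alphas(x, x_star):
    """alpha_i = max over {j : x_j <= x_i} of |x*_j - x_j| (proportionality
    constant taken as 1). This enforces alpha_i >= alpha_j whenever x_i >= x_j."""
    dev = [abs(s - v) for v, s in zip(x, x_star)]
    order = sorted(range(len(x)), key=lambda i: x[i])
    alphas = [0.0] * len(x)
    running, k = 0.0, 0
    while k < len(order):
        # Group equal x values: each member's max ranges over the whole group.
        j = k
        while j < len(order) and x[order[j]] == x[order[k]]:
            running = max(running, dev[order[j]])
            j += 1
        for idx in order[k:j]:
            alphas[idx] = running
        k = j
    return alphas

x = [3.0, 1.0, 2.0, 2.0]
x_star = [3.5, 1.4, 2.1, 1.0]
alphas = monotone_alphas(x, x_star)
```

Sorting makes this O(n log n) rather than the O(n²) of evaluating the maximum separately for every i. Ties in $x$ share the same $\alpha$, as the definition requires.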



A more sophisticated approach is to weaken the 
monotonicity constraint to be robust against occasional 
outliers. 

The length of the watermark, n, determines the degree to which the watermark is spread among the relevant components of the image data. As the size of the watermark increases, so does the number of altered spectral components, and the extent to which each component needs to be altered decreases for the same resilience to noise. Consider watermarks of the form $x_i' = x_i + \alpha w_i$ and a white noise attack $x_i^* = x_i' + r_i$, where the $r_i$ are chosen according to independent normal distributions with standard deviation $\sigma$. It is possible to recover the watermark when $\alpha$ is proportional to

$$\frac{\sigma}{\sqrt{n}}.$$

That is, quadrupling the number of components can halve the magnitude of the watermark placed into each component. The sum of the squares of the deviations remains essentially unchanged.
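A small simulation of this trade-off, assuming $\alpha = c\sigma/\sqrt{n}$ with a hypothetical constant $c = 10$ and hypothetical host values drawn uniformly at random. The detection statistic is the similarity measure $\mathrm{sim}(W, W^*) = W^* \cdot W / \sqrt{W^* \cdot W^*}$ used later in this description. Going from $n = 1000$ to $n = 4000$ halves $\alpha$ and leaves $n\alpha^2$ unchanged, while the detection score stays near $c$.

```python
import math
import random

def detection_score(n, c=10.0, sigma=1.0, seed=0):
    """Additive watermark x'_i = x_i + alpha*w_i with alpha = c*sigma/sqrt(n),
    followed by white noise x*_i = x'_i + r_i, r_i ~ N(0, sigma^2).
    Returns (alpha, sim) where sim = W*.W / sqrt(W*.W*)."""
    rng = random.Random(seed)
    alpha = c * sigma / math.sqrt(n)
    x = [rng.uniform(-100, 100) for _ in range(n)]   # hypothetical host values
    w = [rng.gauss(0, 1) for _ in range(n)]
    x_star = [xi + alpha * wi + rng.gauss(0, sigma) for xi, wi in zip(x, w)]
    w_star = [(xs - xi) / alpha for xs, xi in zip(x_star, x)]
    dot = sum(a * b for a, b in zip(w_star, w))
    return alpha, dot / math.sqrt(sum(a * a for a in w_star))

a1, s1 = detection_score(1000)
a4, s4 = detection_score(4000)
```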

In general, a watermark comprises an arbitrary sequence of real numbers $W = w_1, \ldots, w_n$. In practice, each value $w_i$ may be chosen independently from a normal distribution $N(0,1)$, where $N(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$, or from a uniform distribution over $\{1, -1\}$ or $\{0, 1\}$.

It is highly unlikely that the extracted mark W* will be identical to the original watermark W. Even the act of requantizing the watermarked document for transmission will cause W* to deviate from W. A preferred measure of the similarity of W and W* is

$$\mathrm{sim}(W, W^*) = \frac{W^* \cdot W}{\sqrt{W^* \cdot W^*}}.$$

Large values of sim(W, W*) are significant in view of the following analysis. Assume that the authors of document D* had no access to W (either through the seller or through a watermarked document). Then for whatever value of W* is obtained, the conditional distribution on $w_i$ will be independently distributed according to $N(0,1)$. In this case,

$$W^* \cdot W \sim \sum_{i=1}^{n} w_i^* \, N(0,1) \sim N\Big(0, \sum_{i=1}^{n} (w_i^*)^2\Big) = N(0, W^* \cdot W^*).$$

Thus, sim(W, W*) is distributed according to $N(0,1)$. Then, one may apply the standard significance tests for the normal distribution. For example, if D* is chosen independently from W, then it is very unlikely that sim(W, W*) > 5. Note that somewhat higher values of sim(W, W*) may be needed when a large number of watermarks are on file. The above analysis required only the independence of W from W*, and did not rely on any specific properties of W* itself. This fact provides further flexibility when preprocessing W*.

The extracted watermark W* may be postprocessed in several ways to potentially enhance the ability to detect a watermark. For example, experiments on images encountered instances where the average value of W*, denoted $E_i(W^*)$, differed substantially from 0, due to the effects of a dithering procedure. While this artifact could be easily eliminated as part of the extraction process, it provides a motivation for postprocessing extracted watermarks. As a result, it was discovered that the simple transformation $w_i^* \leftarrow w_i^* - E_i(W^*)$ yielded superior values of sim(W, W*). The improved performance resulted from the decreased value of $W^* \cdot W^*$; the value of $W^* \cdot W$ was only slightly affected.

In experiments it was frequently observed that $w_i^*$ could be greatly distorted for some values of i. One postprocessing option is to simply ignore such values, setting them to 0. That is,

$$w_i^* \leftarrow \begin{cases} w_i^* & \text{if } |w_i^*| > \text{tolerance} \\ 0 & \text{otherwise} \end{cases}$$

The goal of such a transformation is to lower $W^* \cdot W^*$. A less abrupt version of this approach is to normalize the W* values to be either -1, 0 or 1, by

$$w_i^* \leftarrow \operatorname{sign}\big(w_i^* - E_i(W^*)\big).$$

This transformation can have a dramatic effect on the statistical significance of the result. Other robust statistical techniques could also be used to suppress outlier effects.
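A minimal sketch of the similarity measure together with the mean-removal and sign postprocessing described above. The extracted mark is simulated as the true watermark plus unit-variance noise plus a constant offset of 3. That offset is a hypothetical stand-in for the non-zero mean that dithering introduced. An unrelated watermark is included to show the $N(0,1)$ behaviour under independence.

```python
import math
import random

def sim(w, w_star):
    """sim(W, W*) = (W* . W) / sqrt(W* . W*)."""
    dot = sum(a * b for a, b in zip(w_star, w))
    norm = math.sqrt(sum(a * a for a in w_star))
    return dot / norm if norm > 0 else 0.0

def remove_mean(w_star):
    """w*_i <- w*_i - E_i(W*)."""
    m = sum(w_star) / len(w_star)
    return [v - m for v in w_star]

def sign_normalize(w_star):
    """w*_i <- sign(w*_i - E_i(W*)), giving values in {-1, 0, 1}."""
    m = sum(w_star) / len(w_star)
    return [(v > m) - (v < m) for v in w_star]

rng = random.Random(1)
n = 1000
w = [rng.gauss(0, 1) for _ in range(n)]          # the watermark on file
unrelated = [rng.gauss(0, 1) for _ in range(n)]  # a watermark that was never embedded
# Hypothetical extracted mark: noise plus a constant offset of 3.
w_star = [wi + rng.gauss(0, 1) + 3.0 for wi in w]

raw = sim(w, w_star)
centered = sim(w, remove_mean(w_star))
signed = sim(w, sign_normalize(w_star))
null = sim(unrelated, remove_mean(w_star))
```

Removing the offset mainly shrinks $W^* \cdot W^*$, which roughly doubles the score in this simulation. The score for the unrelated watermark stays within the range expected of a standard normal variable.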

In principle, any frequency domain transform can be used. In the scheme described below, a Fourier domain method is used, but wavelet-based schemes are also usable as a variation. In terms of selecting frequency regions of the transform, it is possible to use models for the perceptual system under consideration.

Frequency analysis may be performed by a wavelet or sub-band transform, where the signal is divided into sub-bands by means of a wavelet or multi-resolution transform. The sub-bands need not be uniformly spaced. Each sub-band may be thought of as representing a frequency region in the domain corresponding to a sub-region of the frequency range of the signal. The watermark is then inserted into the sub-regions.
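As an illustration of the sub-band variation, the sketch below assumes a single level of an orthonormal Haar transform (one possible wavelet, chosen here only for simplicity). The watermark goes into the low-frequency sub-band using equation 2, and the signal is then reconstructed. All function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

def haar_split(signal):
    """One level of an orthonormal Haar transform: (low, high) sub-bands.
    Assumes an even-length signal."""
    half = len(signal) // 2
    low = [S * (signal[2 * i] + signal[2 * i + 1]) for i in range(half)]
    high = [S * (signal[2 * i] - signal[2 * i + 1]) for i in range(half)]
    return low, high

def haar_merge(low, high):
    """Exact inverse of haar_split."""
    out = []
    for a, d in zip(low, high):
        out.extend([S * (a + d), S * (a - d)])
    return out

def watermark_low_band(signal, w, alpha):
    """Scale the first len(w) low-band coefficients by (1 + alpha * w_i)."""
    low, high = haar_split(signal)
    marked_low = [c * (1 + alpha * wi) for c, wi in zip(low, w)] + low[len(w):]
    return haar_merge(marked_low, high)

signal = [4.0, 2.0, 5.0, 9.0, 1.0, 3.0, 8.0, 6.0]
low, high = haar_split(signal)
w = [0.5, -1.0, 2.0, 0.0]
marked = watermark_low_band(signal, w, alpha=0.05)
low_marked, high_marked = haar_split(marked)
w_star = [(lm / l - 1) / 0.05 for lm, l in zip(low_marked, low)]
```

The high sub-band is left untouched. The watermark is recovered by transforming the marked signal again and inverting equation 2 in the low sub-band.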

For audio data, a sliding "window" moves along the signal data, and the frequency transform (DCT, FFT, etc.) is taken of the samples in the window. This process enables the capture of meaningful information from a signal that is time-varying in nature.
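A sketch of windowed insertion for audio using NumPy. Several choices here are assumptions rather than details from the text: rectangular, non-overlapping windows of 256 samples, 8 watermark values per window, and insertion into the largest-magnitude non-DC bins of the real FFT with equation 2. The test tone is made-up data.

```python
import numpy as np

def watermark_audio(samples, w, alpha=0.1, window=256, per_window=8):
    """Insert watermark values w into consecutive windows of an audio signal.
    Samples after the last used window are left untouched. Returns the marked
    signal and the (window, bin) positions used, which extraction needs."""
    x = np.asarray(samples, dtype=float).copy()
    positions, k = [], 0
    for start in range(0, len(x) - window + 1, window):
        if k >= len(w):
            break
        spectrum = np.fft.rfft(x[start:start + window])
        magnitude = np.abs(spectrum)
        magnitude[0] = -1.0                      # never select the DC bin
        for b in np.argsort(magnitude)[::-1][:per_window]:
            if k >= len(w):
                break
            spectrum[b] *= 1 + alpha * w[k]      # equation (2)
            positions.append((start // window, int(b)))
            k += 1
        x[start:start + window] = np.fft.irfft(spectrum, n=window)
    return x, positions

def extract_audio(original, marked, positions, alpha=0.1, window=256):
    """Invert equation (2) bin by bin, using the original signal."""
    original = np.asarray(original, dtype=float)
    marked = np.asarray(marked, dtype=float)
    w_star = []
    for idx, b in positions:
        seg = slice(idx * window, (idx + 1) * window)
        ratio = np.fft.rfft(marked[seg])[b] / np.fft.rfft(original[seg])[b]
        w_star.append((ratio.real - 1) / alpha)
    return np.array(w_star)

rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
signal = (np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1250 * t)
          + 0.05 * rng.standard_normal(t.size))
w = rng.standard_normal(64)
marked, positions = watermark_audio(signal, w)
w_star = extract_audio(signal, marked, positions)
```

Scaling a real-FFT bin by a real factor keeps the half-spectrum valid, so the inverse transform remains a real signal. Overlapping, tapered windows would need an overlap-add reconstruction instead.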

Each coefficient in the frequency domain is assumed to have a perceptual capacity. That is, it can support the insertion of additional information without any (or with minimal) impact on the perceptual fidelity of the data.

In order to place a length L watermark into an N x N image, the N x N FFT (or DCT) of the image is computed and the watermark is placed into the L highest magnitude coefficients of the transform matrix, excluding the DC component. More generally, L randomly chosen coefficients could be chosen from the M, M ≥ L, most perceptually significant coefficients of the transform. For most images, these coefficients will be the ones corresponding to the low frequencies. The watermark is placed in these locations because significant tampering with these frequencies will destroy the image fidelity or perceived quality well before the watermark is destroyed.
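The insertion pipeline of Figure 2, with the coefficient selection just described, can be sketched with NumPy's 2-D FFT. Assumptions not stated in the text: equation 2 is used, a random array stands in for image data, and each chosen coefficient's conjugate partner is scaled by the same factor so that the inverse transform stays real. Function names are hypothetical.

```python
import numpy as np

def select_coefficients(F, L):
    """Return L positions of the largest-magnitude FFT coefficients, skipping
    the DC term and taking only one member of each conjugate pair."""
    N, M = F.shape
    chosen, used = [], set()
    for flat in np.argsort(np.abs(F), axis=None)[::-1]:
        u, v = divmod(int(flat), M)
        if (u, v) == (0, 0) or (u, v) in used:
            continue
        chosen.append((u, v))
        used.update({(u, v), ((-u) % N, (-v) % M)})
        if len(chosen) == L:
            break
    return chosen

def embed(image, w, alpha=0.1):
    """Scale each chosen coefficient (and its conjugate partner) by 1 + alpha*w_i."""
    F = np.fft.fft2(np.asarray(image, dtype=float))
    positions = select_coefficients(F, len(w))
    N, M = F.shape
    for (u, v), wi in zip(positions, w):
        factor = 1 + alpha * wi                 # equation (2)
        F[u, v] *= factor
        cu, cv = (-u) % N, (-v) % M
        if (cu, cv) != (u, v):
            F[cu, cv] *= factor                 # keeps the spectrum Hermitian
    return np.fft.ifft2(F).real, positions      # imaginary part is round-off only

def extract(original, marked, positions, alpha=0.1):
    """Recover w*_i from the ratio of marked to original coefficients."""
    F = np.fft.fft2(np.asarray(original, dtype=float))
    F_star = np.fft.fft2(np.asarray(marked, dtype=float))
    return np.array([((F_star[u, v] / F[u, v]).real - 1) / alpha for u, v in positions])

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(32, 32))      # stand-in for real image data
w = rng.standard_normal(100)
marked, positions = embed(image, w)
w_star = extract(image, marked, positions)
score = w_star @ w / np.sqrt(w_star @ w_star)
```

Without an attack the extraction is exact up to round-off, and the score equals $\sqrt{W \cdot W} \approx \sqrt{n}$. The original image is required at detection time, as in Figure 3(b).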

The FFT provides perceptually similar results to the DCT. This is different from the case of transform coding, where the DCT is preferred to the FFT due to its spectral properties. The DCT tends to have less high frequency information than the FFT and places most of the image information in the low frequency regions, making it preferable in situations where data need to be eliminated. In the case of watermarking, image data is preserved and nothing is eliminated. Thus the FFT is as good as the DCT, and is preferred since it is easier to compute.

In an experiment, a visually imperceptible watermark was intentionally placed in an image. Subsequently, 100 randomly generated watermarks, only one of which corresponded to the correct watermark, were applied to the watermark detector described above. The result, as shown in Figure 4, was a very strong positive response corresponding to the correct watermark, suggesting that the method yields a very low false positive rate and a very low false negative rate.
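The detector test can be imitated with a toy simulation. Here the extracted mark is hypothetically modelled as the true watermark plus unit-variance noise, with no real image involved. The index 37 of the embedded watermark is arbitrary. The correct candidate's similarity is far above the scores of the 99 independent candidates, which behave like $N(0,1)$ draws.

```python
import math
import random

def sim(w, w_star):
    """sim(W, W*) = (W* . W) / sqrt(W* . W*)."""
    dot = sum(a * b for a, b in zip(w_star, w))
    return dot / math.sqrt(sum(a * a for a in w_star))

rng = random.Random(42)
n = 1000
candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(100)]
true_index = 37  # hypothetical: the watermark actually embedded
# Hypothetical extracted mark: the true watermark corrupted by unit-variance noise.
w_star = [wi + rng.gauss(0, 1) for wi in candidates[true_index]]
scores = [sim(c, w_star) for c in candidates]
best = max(range(len(scores)), key=scores.__getitem__)
```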

In another test, the watermarked image was scaled to half of its original size. In order to recover the watermark, the image was re-scaled to its original size, albeit with loss of detail due to subsampling of the image using low pass spatial filter operations. The response of the watermark detector was well above random chance levels, suggesting that the watermark is robust to geometric distortions. This result was achieved even though 75 percent of the original data was missing from the scaled down image.

In a further experiment, a JPEG encoded version of the image with parameters of 10 percent quality and 0 percent smoothing, resulting in visible distortions, was used. The results of the watermark detector suggest that the method is robust to common encoding distortions. Even using a version of the image with parameters of 5 percent quality and 0 percent smoothing, the results were well above those achievable by random chance.

In experiments using a dithered version of the 
image, the response of the watermark detector sug- 
gested that the method is robust to common encoding 
distortion. Moreover, more reliable detection is achieved 
by removing any non-zero mean from the extracted 
watermark. 

In another experiment, the image was clipped, leav- 
ing only the central quarter of the image. In order to 
extract the watermark from the clipped image, the miss- 
ing portion of the image was replaced with portions from 
the original unwatermarked image. The watermark 
detector was able to recover the watermark with a 
response greater than random. When the non-zero 
mean was removed, and the elements of the watermark 
were binarized prior to the comparison with the correct 
watermark, the detector response was improved. This 
result is achieved even though 75 percent of the data 
was removed from the image. 

In yet another experiment, the image was printed, 
photocopied, scanned using a 300 dpi Umax PS-2400x 
scanner and rescaled to a size of 256 x 256 pixels. 
Clearly, the final image suffered from distortion introduced at each stage of this process. High frequency
pattern noise was particularly noticeable. When the 
non-zero mean was removed and only the sign of the 
elements of the watermark was used, the watermark 
detector response improved to well above random 
chance levels. 

In still another experiment, the image was subject 
to five successive watermarking operations. That is, the 
original image was watermarked, the watermarked 
image was watermarked, and so forth. The process 
may be considered another form of attack in which it is 
clear that significant image degradation occurs if the 
process is repeated. Figure 5 shows the response of the 
watermark detector to 1000 randomly generated water- 
marks, including the five watermarks present in the 
image. The five dominant spikes in the graph, indicative 
of the presence of the five watermarks, show that suc- 
cessive watermarking does not interfere with the proc- 
ess. 

The fact that successive watermarking is possible means that the history or pedigree of a document is determinable if a further watermark is added with each copy.

In a variation of the multiple watermark experiment, five separately watermarked images were averaged together to simulate a simple collusion attack. Figure 6 shows the response of the watermark detector to 1000 randomly generated watermarks, including the five watermarks present in the original images. The result is that simple collusion based on averaging is ineffective in defeating the present watermarking system.
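A toy version of the averaging attack follows, using equation 2 on hypothetical host coefficients rather than an image. The five colluders' watermark indices are arbitrary. Averaging copies marked with equation 2 yields a mark equal to the mean of the five watermarks. Each contributing watermark still scores about $\sqrt{n/5}$, well above the $N(0,1)$ scores of the other 995 candidates.

```python
import math
import random

def sim(w, w_star):
    """sim(W, W*) = (W* . W) / sqrt(W* . W*)."""
    dot = sum(a * b for a, b in zip(w_star, w))
    return dot / math.sqrt(sum(a * a for a in w_star))

rng = random.Random(7)
n, alpha = 500, 0.1
x = [rng.uniform(10, 100) for _ in range(n)]            # hypothetical host coefficients
bank = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(1000)]
owners = [3, 141, 592, 653, 589]                         # hypothetical colluders' watermarks
copies = [[xi * (1 + alpha * wi) for xi, wi in zip(x, bank[k])] for k in owners]
averaged = [sum(col) / len(copies) for col in zip(*copies)]  # the collusion attack
w_star = [(a / xi - 1) / alpha for a, xi in zip(averaged, x)]
scores = [sim(w, w_star) for w in bank]
top5 = sorted(range(len(bank)), key=lambda k: scores[k], reverse=True)[:5]
```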

The result of the above experiments is that the described system can extract a reliable copy of the watermark from images that have been significantly degraded through several common geometric and signal processing procedures. These procedures include zooming (low pass filtering), cropping, lossy JPEG encoding, dithering, printing, photocopying and subsequent rescanning.

While these experiments were, in fact, conducted using an image, similar results are attainable with text images, audio data and video data, although attention must be paid to the time-varying nature of these data.

The above implementation of the watermarking system is an electronic system. Since the basic principle of the invention is the inclusion of a watermark into spectral frequency components of the data, watermarking can be accomplished by other means using, for example, an optical system as shown in Figure 7.

In Figure 7, data to be watermarked, such as an image 40, is passed through a spatial transform lens 42, such as a Fourier transform lens, the output of which lens is the spatial transform of the image. Concurrently, a watermark image 44 is passed through a second spatial transform lens 46, the output of which lens is the spatial transform of the watermark image 44. The spatial transform from lens 42 and the spatial transform from lens 46 are combined at an optical combiner 48. The output of the optical combiner 48 is passed through an inverse spatial transform lens 50, from which the watermarked image 52 emerges. The result is a unique, virtually imperceptible, watermarked image. Similar results are achievable by transmitting video or multimedia signals through the lenses in the manner described above.

Claims

1. A method of inserting a watermark into data comprising the steps of:

obtaining a decomposition of data to be watermarked;

inserting a watermark into the perceptually significant components of the decomposition of data; and

applying an inverse transform to the decomposition of data with the watermark for generating watermarked data.

2. A method of inserting a watermark into data as set forth in claim 1, said obtaining a decomposition of data being obtaining a spectral decomposition of data.

3. A method as set forth in claim 1 or 2, where said data comprises image data, video data, audio data and/or multimedia data.

4. A method as set forth in claim 2 or 3, where said 
obtaining a spectral decomposition of data is 
selected from the group consisting of Fourier trans- 
formation, discrete cosine transformation, Had- 
amard transformation, and wavelet, multi- 
resolution, sub-band method. 

5. A method as set forth in any one of claims 1 to 4, 
where said inserting a watermark inserts water- 
mark values so that addition of additional signal into 
a perceptually significant component affects the 
perceived quality of the data. 

6. A method as set forth in any one of claims 2 to 5, 
further comprising: 

comparing data with watermarked data for 
obtaining extracted data values; 
comparing extracted data values with water- 
mark values and data for obtaining difference 
values; and 

analyzing difference values to determine the 
watermark in the watermarked data. 

7. A method as set forth in claim 6, where the water- 
mark values are chosen according to a normal dis- 
tribution. 

8. A method of inserting a watermark into data com- 
prising the steps of: 

extracting values of perceptually significant 
components of a spectral decomposition of 
data; 

combining watermark values with the extracted 
values to create adjusted values; and 
inserting the adjusted values into the data in 
place of the extracted values to produce water- 
marked data. 

9. A method of inserting a watermark into data as set 
forth in claim 8, where the watermark values are 
chosen according to a random distribution. 

10. The method as set forth in any one of claims 6 to 9,
where watermark values include associated scaling 
parameters. 

11. A method as set forth in claim 10, where scaling parameters are selected such that adding additional watermark value affects the perceived quality of the data.

12. A method as set forth in any one of claims 8 to 11, further comprising:

comparing data with watermarked data for obtaining extracted data values;
comparing extracted data values with watermarked values and data for obtaining difference values; and
analyzing difference values to determine the watermark in the watermarked data.



13. A method of inserting a watermark into data as set forth in claim 12, further comprising the step of preprocessing distorted or tampered watermarked data before said comparing data.

14. A method of inserting a watermark into data as set forth in claim 13, where said distorted or tampered watermarked data is clipped data and said preprocessing comprises replacing missing portions of the data with corresponding portions from original unwatermarked data.

15. A system for inserting a watermark into data comprising:

providing image data;

providing watermark image data;
first transform lens for transforming image data passing therethrough into transformed image data;

second transform lens for transforming watermark image data passing therethrough into transformed watermark image data;
optical combiner for combining the transformed image data and the transformed watermark image data to form transformed watermarked data; and
inverse transform lens for forming watermarked data by inverse transformation of transformed watermarked data.

16. A system for inserting a watermark into data as set forth in claim 15, where said first transform lens and said second transform lens are Fourier transform lenses and said inverse transform lens is an inverse Fourier transform lens.

17. A method of inserting a watermark into data comprising the steps of:

obtaining a decomposition of data to be watermarked;
modifying the data to be watermarked by subjecting the data to distortion and/or tampering;
obtaining a decomposition of the modified data;
comparing the components of the decomposition of data to be watermarked with the components of the decomposition of the modified data; and

inserting a watermark into the data to be watermarked based upon said comparing.